Background: Large-scale biological jobs on high-performance computing systems require manual intervention if one or more of the computing cores on which they execute fail. This imposes not only a maintenance cost on the job, but also the cost of the time taken to reinstate it and the risk of losing the data and execution the job accomplished before it failed. Approaches that can proactively detect computing core failures and relocate the affected core's job onto reliable cores are a significant step towards automating fault tolerance.

Method: This paper describes an experimental investigation into the use of multi-agent approaches for fault tolerance. Two approaches are studied, the first at the job level and the second at the core level. The approaches are investigated for single-core failure scenarios that can occur in the execution of parallel reduction algorithms on computer clusters. A third approach is proposed that incorporates multi-agent technology at both the job and core levels. Experiments are pursued in the context of genome searching, a popular computational biology application.

Result: The key conclusion is that the proposed approaches are feasible for automating fault tolerance in high-performance computing systems with minimal human intervention. In a typical experiment in which fault tolerance is studied, centralised and decentralised checkpointing approaches add, on average, 90% to the actual time for executing the job. In the same experiment, the multi-agent approaches add only 10% to the overall execution time.
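To make the core-level idea concrete, the following is a minimal sketch, not the paper's implementation: an agent bound to each core checks core health before every step and, on a predicted failure, hands the job (together with its accumulated progress) to a reliable peer rather than letting the step fail. All names here (CoreAgent, Job, run, fail_at) are illustrative assumptions.

```python
class Job:
    """A unit of work whose progress survives relocation between cores."""
    def __init__(self, name, total_steps):
        self.name = name
        self.total_steps = total_steps
        self.completed_steps = 0  # progress carried along on relocation


class CoreAgent:
    """Agent bound to one computing core; monitors the core's health."""
    def __init__(self, core_id, fail_at=None):
        self.core_id = core_id
        self.fail_at = fail_at  # step at which this core is predicted to fail
        self.job = None

    def healthy(self, step):
        return self.fail_at is None or step < self.fail_at


def run(job, agents):
    """Execute the job step by step, relocating it away from failing cores."""
    current = agents[0]
    current.job = job
    for step in range(job.total_steps):
        # Proactive check: before executing the step, the agent inspects
        # core health and, on a predicted failure, migrates the job and its
        # progress to a reliable peer instead of losing completed work.
        if not current.healthy(step):
            spare = next(a for a in agents if a is not current and a.healthy(step))
            print(f"step {step}: core {current.core_id} failing -> "
                  f"relocating {job.name} to core {spare.core_id}")
            current.job, spare.job, current = None, job, spare
        job.completed_steps += 1
    print(f"{job.name} finished on core {current.core_id} "
          f"({job.completed_steps}/{job.total_steps} steps)")


if __name__ == "__main__":
    # Hypothetical scenario: core 0 is predicted to fail at step 4,
    # so the job migrates to core 1 and still completes all steps.
    run(Job("genome-search", 10), [CoreAgent(0, fail_at=4), CoreAgent(1)])
```

Because the agent carries the job's state along at the moment of migration, no separate checkpoint needs to be written and restored, which is one plausible reading of why the multi-agent approaches incur far less overhead than the checkpointing baselines reported above.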